Thresholding Classifiers to Maximize F1 Score
Authors
Abstract
This paper investigates the properties of the widely used F1 metric as a measure of multi-label classifier performance. We show that, given an uninformative binary classifier, the F1-optimal thresholding strategy is to predict all instances positive. More surprisingly, we prove a relationship between the optimal threshold and the best achievable F1 score over all thresholds. We demonstrate that macro-averaged F1, a commonly used multi-label performance metric, can conceal this extreme thresholding behavior. Finally, based on these properties of F1, we suggest average skill score as an alternative to macro-averaged F1 for multi-label classification.
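The first claim in the abstract can be illustrated with a small threshold sweep. The sketch below is not taken from the paper: the helper names (`f1_score`, `sweep_thresholds`, `uninformative_dataset`) and the toy data are assumptions made for illustration. It builds a dataset in which every score level contains the same ratio of positives to negatives, so the scores carry no information about the label, and then checks which threshold yields the highest F1.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep_thresholds(y_true, scores, thresholds):
    """Return (threshold, F1) pairs, predicting positive when score >= threshold."""
    results = []
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]
        results.append((t, f1_score(y_true, y_pred)))
    return results


def uninformative_dataset(levels, positives_per_level, negatives_per_level):
    """Hypothetical toy data: each score level has the same class mix,
    so the score says nothing about the label."""
    y_true, scores = [], []
    for s in levels:
        y_true += [1] * positives_per_level + [0] * negatives_per_level
        scores += [s] * (positives_per_level + negatives_per_level)
    return y_true, scores


# Nine score levels (0.1 .. 0.9), each with 1 positive and 9 negatives (base rate 0.1).
levels = [i / 10 for i in range(1, 10)]
y_true, scores = uninformative_dataset(levels, 1, 9)

# Threshold 0.0 corresponds to predicting every instance positive.
results = sweep_thresholds(y_true, scores, [0.0] + levels)
best_threshold, best_f1 = max(results, key=lambda r: r[1])

for t, f1 in results:
    print(f"threshold={t:.1f}  F1={f1:.4f}")
print(f"best threshold: {best_threshold:.1f}")
```

On this toy data F1 never increases as the threshold rises, so the lowest threshold (predict everything positive) attains the maximum, which is consistent with the abstract's statement about uninformative classifiers. Real uninformative scores are noisy, so an empirical sweep on a finite sample may peak slightly above the minimum threshold.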
Similar resources
Optimal Thresholding of Classifiers to Maximize F1 Measure
This paper provides new insight into maximizing F1 measures in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, the F1 measure is widely used to evaluate the success of a binary classifier when one class is rare. Micro average, macro average, and per instance average F1 measures are used in multilabel classific...
Multi-label Text Categorization with Model Combination based on F1-score Maximization
Text categorization is a fundamental task in natural language processing, and is generally defined as a multi-label categorization problem, where each text document is assigned to one or more categories. We focus on providing good statistical classifiers with a generalization ability for multi-label categorization and present a classifier design method based on model combination and F1-score ma...
Extreme F-measure Maximization using Sparse Probability Estimates
We consider the problem of (macro) F-measure maximization in the context of extreme multi-label classification (XMLC), i.e., multi-label classification with extremely large label spaces. We investigate several approaches based on recent results on the maximization of complex performance measures in binary classification. According to these results, the F-measure can be maximized by properly thr...
Learning Classifiers from Imbalanced, Only Positive and Unlabeled Data Sets
In this report, I presented my results to the tasks of 2008 UC San Diego Data Mining Contest. This contest consists of two classification tasks based on data from scientific experiment. The first task is a binary classification task which is to maximize accuracy of classification on an evenly-distributed test data set, given a fully labeled imbalanced training data set. The second task is also ...
Automatic Assignment of Non-Leaf MeSH Terms to Biomedical Articles
Assigning labels from a hierarchical vocabulary is a well known special case of multi-label classification, often modeled to maximize micro F1-score. However, building accurate binary classifiers for poorly performing labels in the hierarchy can improve both micro and macro F1-scores. In this paper, we propose and evaluate classification strategies involving descendant node instances to build b...